Automatic Numbers Normalization in Inflectional Languages

Authors

  • Jakub Kanis
  • Jan Zelinka
Abstract

This paper is devoted to the text normalization module of our text-to-speech synthesis system. We focus on converting numerals written as figures into a readable full-length form. Numeral conversion is a significant issue in inflectional languages such as Czech, Russian, or Slovak, because morphological and semantic information is needed to make the conversion unambiguous. Three part-of-speech tagging methods are compared. In addition, a method that reduces the tagset in order to increase numeral conversion accuracy is presented.
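To make the ambiguity concrete, here is a minimal Python sketch of why a figure such as "2" cannot be read out in Czech without morphological information, and of what reducing a tagset can look like. It is an illustration only, not the authors' method: the `key=value|...` tag format, the function names `reduce_tag` and `verbalize_two`, and the choice to keep only case and gender are assumptions made for this example. The small table of forms of the Czech numeral "two" is likewise illustrative.

```python
# Forms of the Czech numeral "2", keyed by (case, gender).
# Illustrative table assumed for this example; it is not data from the paper.
# A gender of None marks a form shared by all genders.
FORMS_OF_TWO = {
    ("nom", "masc"): "dva",
    ("nom", "fem"): "dvě",
    ("nom", "neut"): "dvě",
    ("acc", "masc"): "dva",
    ("acc", "fem"): "dvě",
    ("acc", "neut"): "dvě",
    ("gen", None): "dvou",
    ("loc", None): "dvou",
    ("dat", None): "dvěma",
    ("ins", None): "dvěma",
}

# Hypothetical reduced tagset: only the categories that change the numeral's form.
KEPT_CATEGORIES = ("case", "gender")


def reduce_tag(full_tag: str) -> dict:
    """Drop every category of a 'key=value|key=value' tag except the kept ones."""
    pairs = dict(item.split("=", 1) for item in full_tag.split("|") if item)
    return {key: pairs.get(key) for key in KEPT_CATEGORIES}


def verbalize_two(full_tag: str) -> str:
    """Return the written-out form of the figure '2' for the given context tag."""
    reduced = reduce_tag(full_tag)
    case, gender = reduced["case"], reduced["gender"]
    # Try the gender-specific form first, then the form shared by all genders.
    form = FORMS_OF_TWO.get((case, gender)) or FORMS_OF_TWO.get((case, None))
    if form is None:
        raise KeyError(f"no form of '2' known for tag {full_tag!r}")
    return form


if __name__ == "__main__":
    # "2 ženy" (nominative, feminine) vs. "bez 2 mužů" (genitive, masculine)
    print(verbalize_two("pos=noun|case=nom|gender=fem|number=pl"))
    print(verbalize_two("pos=noun|case=gen|gender=masc|number=pl"))
```

In this sketch the tag is assumed to come from a part-of-speech tagger run over the surrounding words. Collapsing tags that differ only in categories irrelevant to the numeral gives the tagger fewer classes to distinguish, which is one plausible reading of how a reduced tagset could help conversion accuracy.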


Similar resources

An Improved Stemming Approach Using HMM for a Highly Inflectional Language

Stemming is a common method for morphological normalization of natural language texts. Modern information retrieval systems rely on such normalization techniques for automatic document processing tasks. High quality stemming is difficult in highly inflectional Indic languages. Little research has been performed on designing algorithms for stemming of texts in Indic languages. In this study, we ...


Very-large Scale Parsing and Normalization of Wiktionary Morphological Paradigms

Wiktionary is a large-scale resource for cross-lingual lexical information with great potential utility for machine translation (MT) and many other NLP tasks, especially automatic morphological analysis and generation. However, it is designed primarily for human viewing rather than machine readability, and presents numerous challenges for generalized parsing and extraction due to a lack of stan...


Constructional Potentiality: Priscianic grammar as a disambiguation technique in the automatic recognition of Latin syntax

In most languages word order plays the major role in determining which words form a single phrase or constituent. A tree structure can be abstracted automatically from a sentence by linear determination of the major syntactic constituents. However, in certain highly-inflected languages, of which Latin is perhaps the most extreme example, cons...


An Approach to Lexical Development for Inflectional Languages

We describe a method for the semi-automatic development of morphological lexicons. The method aims at using minimal pre-existing resources and only relies upon the existence of a raw text corpus and a database of inflectional classes. No lexicon or list of base forms is assumed. The method is based on a contrastive approach, which generates hypothetical entries based on evidence drawn from a co...


Automatic Identification of Learners' Language Background Based on Their Writing in Czech

The goal of this study is to investigate whether learners’ written data in highly inflectional Czech can suggest a consistent set of clues for automatic identification of the learners’ L1 background. For our experiments, we use texts written by learners of Czech, which have been automatically and manually annotated for errors. We define two classes of learners: speakers of Indo-European languag...



Publication date: 2005